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Technical  Report 


Executive  Summary 

For  geometric  based  object  recognition  problems  the  current  best  approach  is 
to  use  a  “Point  to  Surface”,  or  more  generically  speaking  a  “Point  to  Complex 
Model” ,  matching  algorithm  which  we  have  developed  over  the  last  five  years  of 
this  program.  This  approach  is  very  general  addressing  many  recognition  prob¬ 
lems  beyond  simple  target  recognition  problems  with  their  standard  CAD  model 
representation.  A  CAD  model  representation  of  an  object  is  a  special  instance 
of  a  complex  model  representation.  A  complex  is  a  mathematical  structure 
composed  of  simplices.  A  point  is  a  0-simplex,  a  line  a  1-simplex,  a  triangle  a 
2-simplex,  a  tetrahedron  a  3-simplex,  and  so  forth. 

In  most,  if  not  all,  target  recognition  problems  the  Complex  Model  approach 
provides  a  representation  of  the  object  as  a  composite  (or  gluing  together)  of 
many  separate  parts  (generally  surfaces  or  lines).  This  model  is  then  to  be 
matched  against  given  image  data.  By  using  barycentric  coordinates  to  repre¬ 
sent  points  on  each  simplex  the  point  to  point  matching  problem  can  be  avoided 
(or  at  least  greatly  alleviated).  This  correspondence  problem  has  plagued  the 
recognition  community  for  too  long  because  we  have  viewed  it  strictly  as  a  dis¬ 
crete  combinatorial  problem.  By  embedding  the  discrete  combinatorial  problem 
into  the  reals  this  issue  is  more  effectively  solved. 

This  research  actually  goes  beyond  the  specific  representation  of  the  object. 
To  understand  this  note  that  there  are  really  three  components  of  any  matching 
problem  that  can  and  should  be  considered  separately: 

1.  the  object  and  image  representation, 

2.  the  generic  optimization  model  involving  the  representations,  and 

3.  the  algorithm  employed  to  solve  the  model. 
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The  recognition  of  this  decomposition  of  the  problem  should  not  be  under¬ 
estimated  as  parts  are  interchangeable.  To  emphasize  this  aspect  we  have  ad¬ 
dressed  the  solution  of  the  Rubiks  Cube  problem.  For  the  Rubiks  Cube  problem 
the  object  and  image  use  a  simple  (non  Complex  Model)  representation  in  terms 
of  corners,  edges,  and  orientations.  The  specifics  can  be  found  in  the  DASP  2009 
and  SPIE  2009  papers.  But  in  both  problems,  target  recognition  and  Rubiks 
Cube,  the  generic  optimization  model  and  algorithm  used  to  solve  the  model 
arei  identical.  It  is  essentially  a  Least  Squares  Model  using  the  Frobenious  norm 
which  is  equivalent  to  “maximal  correspondence”  problem.  This  optimization 
model  is  a  multilinear  programming  problem  with  a  multilinear  objective  func¬ 
tion  and  decoupled  linear  constraints.  Because  of  its  special  structure  it  can  be 
solved  relatively  efficiently  (compared  with  past  methods). 

The  biggest  shortcoming  of  our  research,  and  the  current  state  of  the  art 
in  recognition,  is  that  we  have  not  been  able  to  develop  a  convex  optimization 
model  and  as  such  there  is  the  pitfall  that  the  optimization  algorithm  can  re¬ 
turn  a  local  minimum  rather  than  a  global  minimum.  For  target  recognition 
problems  this  shortcoming  does  not  have  the  same  impact  as  the  more  general 
problem  like  Rubiks  Cube  because  further  data  can  be  employed  to  improve 
the  probability  of  obtaining  a  global  match.  (Man  made  objects  tend  to  have 
further  structure  which  can  be  exploited  for  improving  the  probability  of  correct 
recognition.)  We  feel  strongly  that  any  further  progress  into  recognition  must 
address  this  local/global  issue  by  a  paradigm  shift  -  moving  away  from  a  least 
squares  based  model  (or  any  of  its  equivalent  representations  such  as  the  maxi¬ 
mal  correspondence  model).  Any  nonconvex  optimization  model  will  be  plagued 
by  this  issue;  there  is  no  algorithm  that  can  quarantee  a  global  minimum  for 
such  models.  Here  we  note  that  some  authors  claim  to  have  made  significant 
progress  on  this  global  optimization.  In  our  review  of  the  literature  nothing 
could  be  further  from  the  truth.  Methods  that  rely  on  a  branch  and  bound  pro¬ 
cedure  (or  box  and  bound,  octree’s  etc.)  all  rely  on  the  model  being  matching 
n  points  to  n  points  where  the  translation  component  can  be  easily  eliminated. 
Then  by  using  Lipshitz  Theory  can  use  discretization  of  the  unknown  rotations 
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and  put  bounds  on  how  much  the  objective  function  can  change  in  a  small  box. 
This  effectively  eliminates  the  rotations  variables  (by  discretizing  over  them  and 
searching  over  all  possibilities).  At  this  point  the  only  unknown  is  the  assign¬ 
ment  correspondence  variables  so  the  model  is  a  simple  linear  programming 
problem  which  has  a  global  minimum  (though  the  argument  of  the  minimum 
need  not  be  unique).  This  whole  approach  collapses  when  one  is  trying  to  match 
n  points  with  m  points  (or  m  points  to  n  surfaces).  The  translation  variables 
cannot  be  removed  and  the  problem  is  4-linear  which  does  not  have  a  unique 
local  minimum  (hence  global  minimum). 

An  understanding  of  the  theory  underlying  the  model  suggest  a  relationship 
to  fundamental  ideas  of  quantum  mechanics  -  for  example  the  idea  of  a  super¬ 
position  of  possibilities  which  the  model  employs.  We  note  that  this  is  really 
just  an  interpretation  with  respect  to  the  variables  which  can  be  interpretted  as 
probabilities1.  However  this  analogy  should  not  be  disregarded;  Shroedingers 
equation,  which  is  a  model  for  the  position  of  a  particle,  has  many  solutions 
depending  upon  the  spectrum  of  the  operator  (for  the  specific  problem  model) . 
These  many  solutions  (or  orbits  when  considering  an  electrons  position)  can 
be  analogized  with  the  many  local  minimum  of  a  recognition  problem.  The 
most  probable  solution  corresponds  to  a  minimuml  energy  state  which  would 
be  a  “global  solution” .  Any  such  paradigm  shift  based  upon  this  analogy  would 
replace  the  least  squares  optimization  model  by  an  expression  modeling  the 
problem  -  such  as  an  equation  rather  than  an  objective  function.  An  opti¬ 
mization  algorithm  would  then  not  be  relevant  to  the  solution  but  rather  other 

1This  relates  to  the  reason  for  the  development  of  the  Category  of  Probabilistic  Mappings 
by  William  Lawvere  back  in  the  1960’s.  He  recognized  that  the  deterministic  world  (based 
upon  aristetelian  logic)  was  a  special  case  of  a  more  general  logic.  Fifty  years  later  Lawvere 
is  still  working  on  this  problem  and  attempting  to  axiomatize  physics  based  upon  categories 
more  general  than  the  category  of  sets.  In  particular,  a  topos  can  replace  the  traditional  sets 
upon  which  so  much  of  traditional  mathematics  and  science  is  based.  A  main  interest  of  his 
is  mechanics  and  his  1963  seminar  paper  on  Probabilistic  Mappings  references  the  issues  of 
interest  which  relate  to  recognition.  Indeed  to  talks  directly  about  the  recognition  problem  in 
terms  of  probabilistic  mappings  which  lead  to  our  development  and  understanding. 
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(numerical)  methods  would  come  to  the  forefront.  To  date  our  very  limited 
attempts  at  such  an  approach  using  stochastic  matrices  rather  than  permuta¬ 
tion  (deterministic)  matrices  have  failed  because  of  lack  of  knowledge  on  how 
to  choose  an  appropriate  basis.  This  novel  approach  is  reported  on  since  it 
generalizes  our  current  method  and  in  our  opinion  holds  the  best  probability  of 
future  progress.  Furthermore  it  allows  for  the  replacement  of  the  optimization 
model  by  an  equation.2 


Outline 

We  break  the  report  down  into  applications  and  theory.  For  someone  wanting 
to  know  how  to  use  these  ideas  knowledge  of  the  theory  is  not  necessary;  only 
the  representation  and  modeling  aspects  are  important.  On  the  other  hand,  the 
theory  provides  both  the  motivation,  underpinnings,  and  directions  for  future 
research  (or  at  least  indicates  what  avenues  have  already  been  explored). 

Our  main  application  is  to  recognition  of  targets  -  those  objects  which  can 
be  represented  by  a  complex.  There  are  many  offshoots  and  add  ons  which  can 
be  exploited  for  this  specific  type  of  problem.  For  example  one  can  generalize 
the  model  by  using  an  Image  Point  Spread  Model  which  allows  for  uncertainty 
in  the  image  data,  or  one  can  weight  the  image  points  representing  the  fact  that 
we  may  have  confidence  in  some  points  more  than  others.  There  are  many  such 
generalizations  to  the  core  problem  and  we  enumerate  them  with  their  models. 

There  are  two  approaches  to  addressing  the  theory.  The  motivational  ap¬ 
proach  -  how  we  can  to  this  methodology  based  upon  basic  ideas  from  cate¬ 
gory  theory  which  view  deterministic  mappings  as  a  left  adjunct  of  probabilistic 
mappings  and  allows  for  further  sheaf  theoretic  description,  or  the  “simplified” 

2  We  recognize  that  an  equation  can  be  viewed  as  a  particular  instance  of  an  optimization 
model  where  the  error  (objective  function)  assumes  a  value  of  zero.  However  there  is  often  a 
change  of  emphasis  on  the  technique  used  to  solve  an  equation  rather  than  an  optimization 
problem. 
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approach  which  distills  the  problem  down  to  a  basic  least  squares  problem.  We 
address  both  of  these  approaches.  The  sheaf  theoretic  approach  also  describes 
the  Beck-Chevalley  Theorem  applied  to  measures  which  is  a  nice  application. 

0.1  3D-3D  Ladar 

The  3D-3D  registration  problem  (also  known  as  3D-3D  alignment)  is  a  funda¬ 
mental  problem  in  computer  vision.  In  this  problem  two  sets  of  3D  points  are 
given  and  the  task  is  to  optimally  align  these  two  sets  of  points  by  determining  a 
group  transformation  (an  affine,  euclidean,  projective  or  other  transformations 
depending  upon  the  camera  model)  so  as  to  minimize  the  mean  squared  error 
between  the  two  sets  of  points.  The  two  sets  can  be  of  different  sizes,  say  of  sizes 
to  and  n  with  m  <  n  in  which  case  the  subproblem  of  selecting  m  points  from 
the  set  of  n  points  needs  to  be  determined  before  the  alignment  itself  can  be 
determined.  Because  this  problem  involves  two  sets  of  points  it  is  also  referred 
to  as  point  to  point  matching. 

Point  to  surface  matching  generalizes  point  to  point  matching  and  permits 
the  correspondence  between  each  point  composing  a  point  cloud  of  image  data 
with  a  point  on  a  surface  of  a  CAD  model  (or  other  idealized  model) .  The  com¬ 
binatorial  problem  of  matching  m  points  to  n  points  now  becomes  matching 
each  of  the  to  points  in  the  point  cloud  to  a  correspondence  point  on  one  of 
the  n  surfaces.  By  introducing  assignment  (correspondence)  parameters  that 
associate  with  each  image  point  i  the  “probability”  of  matching  with  surface 
component  j,  the  discrete  matching  problem  can  be  embedded  into  the  contin¬ 
uous  domain  and  the  combinatorial  explosion  that  occurs  with  matching  can  be 
significantly  alleviated  from  the  computational  perspective.  Indeed,  for  a  fixed 
group  transformation  this  matching  problem  is  a  linear  programming  problem 
for  point  to  point  matching  and  a  quadratic  programming  problem  for  point 
to  surface  matching.  Conversely,  for  known  correspondences  (point  to  point  or 
point  to  surface)  the  problem  is  a  pure  pose  estimation  problem.  For  this  reason 
the  registration  problem  is  also  called  the  simultaneous  pose  and  correspondence 
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problem. 

The  challenge  in  solving  this  problem  is  to  determine  the  global  optimal 
solution.  Even  for  the  point  to  point  matching  problem  this  is  not  an  easy 
task.  The  most  popular  methods  for  solving  this  problem  are  based  upon  an 
alternating  minimization  procedure  between  solving  the  pose  problem  and  the 
correspondence  problem.  These  methods  are  all  local  methods  as  no  global 
optimum  is  guaranteed.  The  method  based  upon  Lipschitz  optimization3  has 
limited  use  since  it  is  restricted  to  matching  n  points  to  n  points. 

The  approach  discussed  here  is  based  upon  the  discretization  of  the  transfor¬ 
mation  parameters  as  well  so  that  the  problem  becomes  a  multilinear  program¬ 
ming  problem  in  the  correspondence  and  pose  (transformation  group)  param¬ 
eters.  The  theoretical  basis  of  this  model  begins  with  abstracting  the  general 
recognition  problem  as  one  of  a  least  squares  problem. 

0.1.1  Least  Squares  Model 

Define  a  Least  Squares  Error  Recognition  Problem  as  one  which  can  be  modeled 
as  the  determination  involving  the  Frobenius  norm  minimization  of  the  error 
between  an  Object  O ,  an  image  X,  and  a  transformation  group  Q  which  acts  on 
the  image, 


min  \\0  -  gl\\2F.  (1) 

g&Q 

This  model  involves  the  determination  of  the  argument  of  the  minimum 
(argmin)  of  the  minimuml  objective  function  value  which  most  closely  matches 
the  object  with  the  transformed  image.  The  objective  function  is  a  ^-invariant 
(pseudo)  metric  measuring  the  “distance”  between  the  object  and  the  trans¬ 
formed  image.  Many  recognition  problems  can  be  formulated  in  this  fashion, 
and  a  linear  representation  of  the  object,  image,  and  transformation  group  are 
applicable.  Such  is  the  case  for  3D  Euclidean  recognition  problems  where  the 
object  and  image  can  be  given  a  matrix  representation  where  the  columns  (or 
'The  3D  Registration  Problem  Revisited,  Hongdong  Li  and  Richard  Hartley,  ICCV  2007. 
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rows)  of  the  matrices  correspond  to  ordered  image  points  and  points  on  the 
object  are  given  by  barycentric  coordinates  on  the  surface  components. 

Using  assignment  variables  allows  one  to  consider  unordered  collections  of 
points.  To  each  image  point  indexed  by  i  there  corresponds  an  object  point 
indexed  by  j.  So  we  associate  a  variable  Pij  which  assumes  the  value  1  provided 
that  image  point  i  corresponds  to  the  object  point  j  and  pjj  =  0  otherwise. 
Consider  the  2D  point  to  point  matching  problem.  Here  the  optimization  model 
for  matching  unorclered  points  {it,:};  to  object  points  {xj}j  is 


min 

R.esO(2),Te?R2,Pi,j 
subject  to 


E,  Ej  Pij  II  ( Rui  +  T)-Xj  1 1 2 


Ej  Pij  =  1  Vi 
Pij  >  o  Vi,  j 


(2) 


By  the  introduction  of  barycentric  coordinates  associated  with  the  object 


points  \xj }  this  problem  can  be  extended  to  matching  image  points  to  lines, 
surfaces,  or  any  convex  hull  of  a  collection  of  points  by  the  optimization  model 


mm 

fleSO(2),TGSR2,Pi,: 

subject  to 


Ei  T,jPi,MRui  +  T)  -  Efc  OLi,j,kX 


k  112 


E,  Pij  ~  1  V* 

Pij  >  0  Vi,  j 

Efc  =  1  Vi,j 


^  d  Vi,  j,  k 


(3) 


where  the  (Xi.j,k  are  the  barycentric  coordinate  associated  with  the  assign¬ 
ment  of  image  point  u;  to  surface  j  which  has  vertices  {xj, . . . ,  lj  }  where  NVJ 
is  the  number  of  vertices  characterizing  the  convex  hull  defining  surface  j. 

The  challenge  with  this  model,  as  with  all  current  models  that  we  are  aware 
of4,  is  that  because  the  model  is  not  a  convex  optimization  problem  (convex 
4 If  one  restricts  the  problem  to  matching  m  points  to  m  points  then  the  problem  is  signifi¬ 
cantly  simpler  because  the  translation  variables  can  be  removed  from  the  problem.  The  paper 
[?]  uses  Lipschitz  global  optimization  theory  to  address  matching  m  points  to  m  points. 


objective  function  with  convex  constraints)  the  algorithmic  solution  to  this 
problem  cannot  guarantee  a  global  minimum.  The  ability  to  obtain  a  global 
minimum  can  be  significantly  improved  by  taking  a  “probabilistic  perspective” 
of  the  problem  and  exploiting  additional  information  (or  extracting  informa¬ 
tion  from  the  given  data).  Towards  this  end  we  discretize  the  rotation  and 
translation  parameters. 

0.1.2  Discretization  of  the  Transformation  Group  Parameters. 

In  the  model  (1)  suppose  Q  is  a  finite  discrete  transformation  group  which  can  be 
written  as  a  composition  of  subgroupoids,5  Q  —  Q1 . . .  QL.  If  ke  =  \Gl\,  then  for 
each  £,  fix  a  labelling  {g\ .  g%  . . .  ,g[e}  of  Ge.  Let  J  =  {l,...,fci}x---x{l,...,fcF} 
be  an  index  set  so  that  any  j  £  J  corresponds  to  an  element  g-}  =  g^  ■  ■  ■  gb  in 
Q.6  The  set  of  all  possible  values  the  objective  function  || O  —  g1\\p  can  assume 
is 

{ej  =  ||0-.9jT||2F}jeJ.  (4) 

Associate  with  each  error  term  ej  a  weight  p-}  =  p!ri  ■  ■  ■  P^npfL  and  consider  the 
weighted  sum  of  all  these  error  components 

(5) 

je^ 

Since  the  index  j  specifies  the  components  of  the  element  gj  =  gjx  ■  ■  ■  asso¬ 
ciated  with  the  error  ej  there  is  a  bijective  correspondence  between  the  compo¬ 
nents  gj  and  p*  .  Restrict  the  variables  p*  to  satisfy  J2iLiPi  =  1  and  p\  >  0  for 
all  components  £  and  all  element  indices  i.  For  each  Qe,  the  set  { . . . ,  gf.( } 
is  a  finite  sample  space  and  each  vector  •  •  *  >pL)  associated  with  that 

component  can  be  interpreted  as  a  distribution  on  the  (global)  £th  component 
of  an  optimal  transformation  minimizing  the  expression  (5).  Thus  for  Q  a 

5 A  subgroupoid  in  this  setting  means  a  subset  of  Q  which  contains  the  identity  of  Q  and 
inverses  for  each  element,  but  the  binary  operation  of  Q  is  restricted  to  pairs  whose  product 

is  in  the  subset  (the  operation  inherits  associativity,  of  course,  when  defined). 

6The  decomposition  of  elements  g  E  Q  need  not  be  unique.  One  can  have  j,i  E  J  with 

gj  =  gi  and  j  /  i. 
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discrete  group,  (1)  is  equivalent  to  the  optimization  model 


mm 


kg_ 

^2  pi  =  1  (for  all 


&jej 

subject  to 
and 


(6) 


pi  >0  (for  all  l  and  all  i) 


By  the  form  of  the  objective  function  a  global  minimum  can  always  be 
obtained  corresponding  to  a  dirac  probability  measure  at  each  component  -  i.e. 
if  ej0  is  minimuml  over  all  j  £  J,  then  setting  the  components  of  its  coefficient  to 
one  and  all  the  others  to  zero  yields  a  global  minimum. '  Such  a  determination 
of  variables  will  satisfy  the  constraints  and  is  a  dirac  measure. 

For  3D  recognition  problems  the  Euclidean  group  can  be  decomposed  into 
six  linear  transformation  groups  involving  the  rotation  and  translation  groups, 
{Tx,Ty,Tz,  Rx,  Ry,  Rz}.  By  discretizing  each  group,  an  approximation  to  the 
least  squares  problem  (1)  model  in  the  form  of  (6)  can  be  obtained. 


0.1.3  The  Discretized  Least  Squares  Model  with  Assignment  Corre¬ 
spondence 

The  2D  point  to  surface  model  with  discretized  transformation  group  parameters 
can  be  written  as 


min 

Pn>Pp’P%’Pi,j’ai,j-k 


J2J2J2J2  \\iR«ui  +  T0n)  -  Efc  ai,j,kXj\\2 

i  j  a  0  7 


subject  to 


Pij  = 1  Vi  p»  = 1  E/s  px0  = 1  E7  pv7  = 1 

Pij  >  0  Vi,j  Pa  >0  Va  pp>  0  V/3  p^>  0  Vy 
Efc  =  1  Vi,  j 


az,j,k  >  OVi,  j,  k 
where  and  {Tp^p^  = 


(7) 


xp 

Vi 


are  both  a  set  of  constant  values. 


/3,7 


7The  minimum  objective  function  value  may  not  have  a  unique  argmin. 
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The  computational  effort  associated  with  this  model  is  greatly  simplified 
by  expanding  the  L2  norm  in  terms  of  the  inner  product,  ||x||2  =  ( x,x ),  and 
bringing  the  summation  terms  inside  wherever  possible.  By  exploiting  the  fact 
that  the  variables  satisfy  YlaPa  =  1)  etc.,  and  centering  the  image  data  the 
above  objective  function  simplifies  to 


Ni 


Ni 


Ui)  2 ^  Ra)uii  'y  '  y  '  Pi,jai,j,kxj) 


i— 1 


-21 


Ni 


Eu pxpxp  I , y  y y Pi,jai,j,kxlj) + I Y pxppyi (4 + y2j)  (8) 

z^7P72/7  J  i=i  j  k  py 


'5~'jPi.i(51k  ai,j,kxj,  Y^ik1  ai,j,k'xj  ) 

*= 1  j 

The  computation  of  the  barycentric  coordinates  is  where  all  the  computation 
time  is  required.  As  the  distributions  for  each  image  point  u,  converge,  •  — >  6 
so  the  distribution  has  components  with  value  0,  the  computations  become 
simple.  Furthermore  the  decoupled  constraints  makes  the  computation  for  the 
projection  of  the  gradients  onto  the  nullspace  very  efficient  for  an  optimization 
algorithm.  The  extension  of  this  model  to  3D  matching  is  straightforward.  Some 
typical  3D  results  are  shown  in  section  0.1.6. 


0.1.4  The  Maximal  Correspondence  Model 

Going  back  to  the  notation  of  model  (1)  and  it’s  discretization  in  section  0.1.1 
consider  the  notation  (A,B)p  =  Tr(ABT).  By  using  the  normalization  con¬ 
straints  on  the  set  of  variables  and  exploiting  the  bilinearity  of  the  oper¬ 

ator  (•,  •) p  the  objective  function  can  be  written  as 

\\m2F+l\\i\\2F-Yp^°^fl)F 

(9) 

=  constant  —  (O,  f- 

j  £J 
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Defining  the  superposition  of  transformations  as  Pe  =  E  Pi9i,  dropping  the 

i—1 

constant  term,  and  using  min—  z  =  max z  yields  the  model 

L 

max  {O,  TT  Pll)'2F  (10) 

t=i 

subject  to  the  same  linear  constraints  in  ((6)).  This  model  is  called  the  Max¬ 
imal  Correspondence  Model  and  is  the  basis  of  the  Map  Seeking  Algorithm8. 
In  their  models  they  also  discretize  space  rather  than  using  barycentric  coordi¬ 
nates.9  Hence  the  minimuml  error  and  maximal  correspondence  formulations 
are  equivalent. 

0.1.5  An  Optimization  Algorithm  for  the  Multilinear  Model 

Using  the  notation  of  section  0.1.1  let  p  =  ®f=1pf  be  the  direct  sum  of  the 
L  probability  distributions  pf  =  (p\ , . . . ,  pi  )  .  For  the  2D  matching  problem 
above,  one  may  have  p1  =  {pf , . . .  ,p%f  },  the  distribution  associated  with  the 
optimal  ^-translation.  Starting  with  an  initial  uniform  distribution  for  each  p 1 , 
the  decoupled  constraints  permit  the  updating  of  the  probability  distributions 
associated  with  each  unknown  parameter  in  parallel  which,  depending  upon 
the  number  of  processors  used,  is  considerably  more  efficient  than  a  sequential 
procedure.  Using  projected  gradients  the  search  direction  can  be  computed  as 
d  =  ®^=idf  and  each  component  of  the  direct  sum  ®^=1  pe  can  be  simultane¬ 
ously  updated  via  a  step  p1  =  p(  +  adf .  A  line  search  procedure  can  be  used  to 
calculate  a  step  length  a  or,  upon  normalization  of  the  search  directions  df  in 
each  subspace,  a  constant  step  length  a  at  each  iteration  can  be  employed.  Our 
experience  suggest  the  constant  step  length  is  just  as  effective  as  a  line  search 
procedure  and  computationally  more  efficient. 

®David  Arathorn.  Map  Seeking  Circuits  in  Visual  Cognition:  A  Computational  Mecha¬ 
nism  for  Biological  and  Machine  Vision.  Stanford  Press,  2002. 

9S.R.  Harker,  C.R.  Vogel,  T.  Gedeon.  Analysis  of  constrained  optimization  variants  of 

the  Map  Seeking  Circuit  Algorithm,  Journal  of  Mathematical  Imaging  and  Vision ,  29  (2007), 
pp. 49-62. 
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The  alternative  choice  to  a  parallel  implementation  is  a  sequential  proce¬ 
dure  updating  each  distribution  in  turn,  and  using  the  updated  distributions 
pe  and  the  corresponding  updated  superposition  Pt  =  to  compute  the 

i—  1 

projected  gradients  of  the  next  distribution  pf+1.  By  exploiting  adjoints 

(O,  PL  ■  ■  ■  Pe  ■  ■  ■  Pl  1)  =  (( Pe)T  ■  ■  •  ( Pl)tO ,  Pe~x  ■  ■  ■  P 1T)  (11) 

the  updating  procedure  can  be  made  efficient  in  the  maximal  correspondense 
model  (15). 

0.1.6  Results 

Using  the  model  developed  above  and  projected  gradients,  figure  1  shows  how 
each  image  point  corresponds  bijectively  to  a  single  point  on  the  CAD  model 
using  this  type  of  model. 

■  Weighted  Image  Points 

■  Correspondence  Points 

d(lmage.Tank)=  1.14797 


Figure  1:  Correspondence  between  image  points  and  correspondence  points  on 
the  CAD  model. 


Given  a  whole  database  of  objects  one  can  sequentially  apply  the  algorithm 
to  each  prototype  (generic  representative  of  a  class  of  objects)  and  obtain  results 
such  as  in  figure  2  where  the  distances  between  the  various  objects  in  the 
database  can  be  ordered.  An  alternative  is  to  add  another  variable  probability 
distribution  which  optimizes  over  all  the  objects  simultaneously,  {pjf}- 
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•  Weighted  linage  Points 
m  Correspondence  Points 


•  Weighted  Image  Points 
■  Correspondence  Points 


•  Weighted  Image  Points 
m  Coirespondence  Points 


d<Image.modell)=2.09ll3  d(Image.mode!2)=2J7132  d(Image.bmp)=  1.69437 


Figure  2:  Metric  distances  between  a  point  cloud  of  data  and  several  objects. 


More  results  are  shown  in  the  following  pages  where  the  complexity  of  the 
models  is  detailed  in  the  CAD  models  shown.  It  is  computationally  critical 
to  the  solution  algorithm  that  the  CAD  model  consist  of  as  few  surfaces  as 
possible.  The  existing  CAD  model  generators  are  incredibly  inefficient  in  this 
regard;  they  generate  models  consisting  of  triangles  so  a  simple  square  is  always 
represented  by  two  surfaces  rather  than  a  single  surface  with  4  vertices.  We  have 
constructed  a  “glue  model”  which  attempts  to  glue  such  surfaces  together  based 
upon  a  common  edge  and  the  surface  normals  being  identical.  Unfortunately 
many  of  the  CAD  model  generators  generate  models  such  that  the  two  surfaces 
which  should  have  identical  surface  normals  do  not.  We  have  attempted  to 
correct  this  but  it  is  not  always  possible  (algorithmically)  since  the  data  can 
sometimes  vary  significantly. 
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Figure  3:  Some  CAD  models  employed  for  testing  (models  1  and  2). 
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Figure  4:  CAD  Model  3  and  4. 
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The  image  data  for  Test  1,  generated  synthetically,  is  shown  in  Figure  6 
while  the  results  of  employing  this  test  data  are  shown  in  Figure  7.  Here  the 
image  data  is  such  that  the  objects  can  be  clearly  distinguished  (separated). 
For  some  of  the  image  data  that  follows  this  is  not  the  case. 


The  image  data  for  Test  2  is  shown  in  Figure  8  (These  results  are  better 
viewed  in  Mathematica  where  the  images  can  be  rotated,  scaled,  etc.) 
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•  Weighted  Image  Points  ■  Weighted  Image  Points  •  Weighted  Image  Points  ■  Weighted  Image  Points  ■  Weighted  Image  ftjinu 

I  Correspondence  Points  ■  Correspondence  Points  I  Correspondence  Poinu;  jcoirespondence  Points  ■  Correspondence  ftiints 

d(Inugejnodell)=2.091l  j  d(lraagejnodel2)=2J7132  d(Image.bmp)=l.69437  d(im,ge.t72)=2.68l77  d(lmajejn60(=2.4788 


Figure  7:  Results  with  image  data  set  1. 


Figure  8:  Image  data  for  Test  2. 


I  Weighted  Image  Points 
I  Correspondence  Points 

d(lmage.modell)=2.03734 


I  Weighted  Image  Points 
■  Correspondence  Points 

ddmage  .mode  121=2.06326 


•  Weighted  Image  hints 

•  Correspondence  Ftiints 

d(taage.hmp)=2. 14572 


■  Weighted  Image  Points 

■  Cotresponde nee  Points 

d(lmage.t72)=2.983SI 


I  Weighted  Image  hints 
I  Correspondence  hints 

d(taagejn60)=228844 


Figure  9:  Results  with  image  data  set  2. 


(•  Imago  data  tast3  •) 


■  Weighted  Image  Points 

■  Correspondence  Points 

d(lmage4nodcl2)=2. 19489 


■  We  ighted  Image  Points 

■  Correspondence  Points 

{ddmage  jnodell  )=3X)7058 

% 


■  Weighted  Image  Points  ■  Weighted  Image  Points  a  Weighted  Image  Points 

■  Correspondence  Points  a  Correspondence  Points  a  Correspondence  PoinLs 


'  d(Image.bmp)=2.95517  '  d(Image.t72)=3.093l2  '  d(Image.m60)=2.82l67 


%  % 


* 


Figure  10:  Results  with  image  data  set  3. 


With  image  data  set  3  there  are  not  enough  points  on  critical  components 
to  distinquish  between  the  first  3  models  and  the  results  give  the  wrong  target. 
We  can  constrast  this  with  data  set  1  where  the  viewpoint  was  more  favorable 
to  be  able  to  separate  the  models  and  give  the  correct  result. 
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d{  Image.bmp)  =  2  -  I  4572 


Figure  11:  Results  on  data  from  bmp  (correct  target)  with  image  data  set  3. 

(•  image  data  teati  Results  *) 

B  Weighted  Image  Points 
B  Coirespondence  Points 

d(Image  jnodel2)=  1 .82928 


B  Weighted  Image  Points 
B  Correspondence  Points 


{d(lmagejnodell  (=3.69642  ' 

% 


B  Weighted  Image  Points 
B  Correspondence  Points 

d(Image.bmp)=3. 12555 

% 


B  Weighted  Image  Points 
■  Correspondence  PoinLs 

d(lmage.t72)=3.l2185 


B  Weighted  Image  PoinLs 
B  Correspondence  Points 

d(Imagejn60)=2.86812  } 

% 


Figure  12:  Results  with  image  data  set  4. 

Matching  points  to  surfaces  can  be  modified  by  extracting  planes  of  image 
points  in  the  point  cloud  data.  Then  by  using  generalized  principle  component 
analysis,  discussed  in  section  0.1.7,  one  can  match  the  extracted  surfaces  to  the 
CAD  model  surfaces.  If  one  also  can  obtain  a  good  estimate  of  where  the  ground 
plane  is  then  the  correspondence  variables  can  be  initialized  accordingly  and  an 
initial  “total  ignorance”  distribution  of  a  uniform  distribution  is  not  necessary. 
Even  better  estimates  can  be  obtained  by  taking  into  account  the  size  of  the 
various  surfaces  so  that  if  surface  1  is  twice  as  large  as  surface  2  then  for  an 
arbitrary  image  point  one  would  expect  a  higher  probability  that  the  image  point 
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came  from  the  second  surface.  This  probabilistic  interpretation  (perspective)  of 
the  problem  provides  guidance  in  how  to  initialize  all  the  unknown  variables. 

0.1.7  Generalized  PCA 

The  technique  of  generalized  principal  component  analysis  allows  one  to  extract 
hyperplanes  from  point  clouds.  Each  one  of  these  hyperplanes  has  a  surface 
normal  so  that  if  one  can  estimate  the  ground  plane  then  the  surface  normals  of 
the  extracted  hyperplanes  can  be  compared  against  the  surface  normals  of  the 
surfaces  of  the  CAD  model  to  come  up  with  prior  probabilities  for  the  assignment 
correspondence  probabilities.  In  Figure  13  a  point  cloud  consisting  of  two 
clusters  of  points,  each  corresponding  to  a  distinct  hyperplane,  are  shown.  We 
can  associate  a  (normalize)  surface  normal,  defined  up  to  a  sign,  with  each  one  of 
these  hyperplanes.  Call  these  surface  normals  soranffe  and  SuUe ■  Given  a  single 
surface  with  a  known  (normalized)  surface  normal  s0bjectPart  the  correspondence 
between  the  given  surface  can  be  determined.  A  simple  expression  is  given 
by  the  absolute  value  of  the  inner  product  between  the  hyperplane  normals 
and  the  surfaces  normals,  \(s0rangei  SobjectPart)  |  and  blue ?  SobjectPart\)  and  the 
resulting  associated  probabilities  based  upon  this  are  shown  in  Figure  14. 


0.1.8  Image  Point  Spread  Model 

In  the  Image  Point  Spread  Model  the  lack  of  knowledge  of  the  precise  location 
of  each  image  point  can  be  incorporated  directly  into  the  optimization  model  of 
the  problem.  This  weighted  least  squares  optimization  model  is 


21 


Figure  13:  A  point  cloud  consisting  of  two  distinct  hyperplanes. 


mm 
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i=  1  m=  1 
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||RM  +  (T  +  ATt)-S 


^ imk  % 


k 

m 


1 12 
IIf 


subject  to 


\ATt\  <  6i 

(12) 

where  Nj  is  the  number  of  image  points,  Ns  is  the  number  of  “surfaces”  in  the 
CAD  Model,  Ny^m)  is  the  number  of  vertices  of  the  mth  surface  of  the  CAD 
model,  and  S  is  a  parameter  expressing  our  uncertainty  in  the  knowledge  of  the 
image  points.  While  this  parameter  can  vary  for  each  individual  image  point 
our  present  implementation  just  assumes  a  constant  value  over  all  image  points. 
Note  that  this  model  is  ideal  for  a  parallel  implementation  with  respect  to 
optimization  over  the  individual  point  spreads  A Xj  since  they  are  all  decoupled 
constraints. 

We  have  added  this  point  spread  capability  to  our  working  algorithm  which 
coded  up  in  Mathematica.  We  have  applied  this  to  our  test  model  shown  in 
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P[0range,Green]=.7 

P[0range,Blue]=.3 


P[Red,Green]=.l 

P[Red,Blue]=.9 


Figure  14:  Computing  prior  probabilities  based  upon  the  surface  normals  of  the 
extracted  hyperplanes. 

Figure  15. 

Figure  16  shows  the  results  which  are  exactly  what  we  expected  from  this 
model  -  it  decreases  the  weighted  least  squares  error  as  the  slack  Si  is  increased. 
The  x-axis  in  this  figure  represents  the  value  Si  and  the  y-axis  shows  the  RMS 
error  for  this  value. 

The  RMS  error  does  not  go  to  zero  because  of  the  discretization  involved  in 
the  given  model.  By  refining  the  discretization  it  is  possible  to  get  the  error  to 
converge  to  zero. 

One  must  have  relatively  small  errors  in  the  signal  else  everything  starts  to 
look  the  same,  i.e. ,  any  image  could  have  been  produced  by  any  object.  Here  is 
the  same  image  data  applied  to  a  decoy. 
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^  Weighted  Image  Points 
^  Correspondence  Points 

d(Image,Tank) =1.1 4797 


Figure  15:  Standard  Test  Case 

RMS  Error 


Figure  16:  Image  Point  Spread  Results 


0.2  2D-3D 

In  the  2D-3D  matching  problem  we  only  have  2  dimensional  information  avail¬ 
able  about  the  object  in  question.  In  this  case  the  3D-3D  model  can  be  modified 
so  that  the  third  unknown  coordinate  is  treated  as  an  unknown  parameter  which 
can  be  optimized  over.  To  implement  such  an  approach  it  is  necessary  to  extract 
lines  in  the  2D  data,  say  by  a  canny  edge  detection  algorithm,  and  to  match 
that  up  against  a  wire  frame  model  of  the  object. 

An  initial  manual  registration  of  the  edge  detected  image  data  to  the  wire 
frame  model  is  employed.  We  start  with  the  2D  image  (so  the  3rd  coordinate  z 
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^  Weighted  Image  Points 
^  Correspondence  Points 

d(Image,Car)  =3 .7 1 438 


Figure  17:  Car  Decoy  Matching  Correspondence  -5  =  0 

RMS  Error 


Figure  18:  Image  Point  Spread  Results  for  Tank  and  the  Car  Decoy 
is  some  constant  for  all  points).  Figure  19  shows  the  arbitrary  initial  placement. 


Figure  20  shows  a  manual  registration  obtained  by  translating  the  image 
and  rotating  the  wire  frame  model. 


This  initial  registration  need  not  be  ’’exact”  just  close  so  as  to  avoid  issues 
of  global  optimality  concern.  By  this  initial  registration  the  code  can  handle 
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Figure  19:  Here  is  the  image  and  wire  frame  object  -  need  an  initial  manual 
registration. 


Figure  20:  An  initial  alignment. 


relatively  large  CAD  models  because  the  initial  probabilities  can  be  set  accord¬ 
ingly. 

The  optimization  model  for  this  problem  is 


min 

subject  to 


E  i  E  j  Pi, j  1 1  (Iui  +  T)-tt  J2k  R(ai,j,kXj)  ||2 


E  jPi,j  =  1  V* 

Pi,j>  0  Vi,j 

Efc  ai,j,k  =  1 

oti,j,k>  0  k 


(13) 


where  7r  is  the  projection  mapping  onto  2D  (wolog,  say  the  first  two  coordi¬ 
nates).  Now  the  object,  rather  than  the  image,  is  rotated  and  the  transforma¬ 
tion  of  the  image  involves  an  inversion  I  to  account  for  parallax  -  this  of  course 
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depends  upon  the  model  -  which  occurs  with  most  cameras. 

This  initial  registration  gives  me  a  good  starting  point  for  the  optimization 
problem  -  that  being  to  find  the  optimal  general  linear  projective10  transfor¬ 
mation  of  the  image  data  and  optimal  3D  rotation  of  the  wire  frame  model  to 
minimize  the  total  error.  The  result  is  shown  in  Figure  3  which  gives  the  image 
data  (blue  pts)  and  corresponding  points  (red  points)  on  the  wire  frame  model 


Figure  21:  Here  is  after  the  optimization  routine  improves  the  initial  registration 
(alignment) . 


By  using  finer  discretizations  I  can  obtain  more  accurate  results  -  as  ex¬ 
pected.  This  refinement  problem  will  need  some  analysis. 

This  result  allows  me  to  then  properly  place  the  damaged  image  data  onto 
the  3D  CAD  model 


The  optimization  routine  then  calculates  the  corresponding  points  on  the 
3D  CAD  model.  Here’s  the  optimal  correspondence 


1  °The  scale  is  limited  to  a  small  range  about  1  so  the  metric  still  exist  (is  nonzero). 
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Figure  22:  Here  is  the  registration  of  the  damaged  data  points  onto  the  3D  CAD 
Model. 


Figure  23:  Here  is  the  correspondence  between  the  damaged  data  points  and 
correspondence  points  on  the  3D  CAD  Model. 

These  red  correspondence  points  “are  the  solution”.  Note  that  the  blue 
image  points  don’t  lie  exactly  on  the  surface  -  this  is  because  the  ”z”  coordinate 
is  unknown.  A  good  initial  alignment  is  desirable  to  get  the  image  points  to 
lie  as  close  as  possible  to  the  surface  but  is  not  required.  Indeed  for  a  curved 
surface  some  of  the  image  points  will  necessarily  lie  off  of  the  surface.  You  can 
see  this  in  the  above  image  on  the  front  curved  surface.  The  picture  below 
illustrates  this. 
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Figure  24:  Another  view  of  the  correspondence  between  the  damaged  data 
points  and  correspondence  points  on  the  3D  CAD  Model. 


0.3  Rubiks  Cube 


Rubik’s  Cube  can  be  represented  by  specifying  a  state  of  the  cube  by  specify¬ 
ing  the  location  of  20  cubes  and  the  orientation  of  those  cubes.  This  can  be 
accomplished  with  a  (0,  l)-matrix  representation.  For  Rubik’s  Cube  the  trans¬ 
formation  group  is  the  set  of  all  possible  permutations  of  the  cube  which  is 
a  discrete  group.  This  transformation  group  Q  itself  can  be  decomposed  into 
‘layers’  and  the  rotations  about  each  of  the  six  faces,  g  =  Pj  t  o  ...  o  Pj  t  o  Pj  t. 

Let  Ql  be  the  group  of  permutations  on  the  cube,  for  each  layer  l  of  rotations, 
yielding  the  model 


L  6  3 


mm 


/= 1  /=i  t= o 


subject  to 

6  3 

for  all  layers  l 

f= 1 1= o 

plft  >  0  for  all  faces  /,  layers  l,  and  turns  t 


(14) 


where  L  is  the  number  of  layers  of  rotations.  Upon  further  simplification  the 

L  6  3  L 

objective  function  can  be  written  as  min|  n  ee  p/, tii°-  n^ii3-11 

Pf’t  1=1  f= 1  t= 0  m= 1 

11  For  computational  purposes  this  is  not  the  most  efficient  model  since  for  t  =  0  turns  about 
each  face,  Pj  t  is  the  identity  map.  A  single  identity  map  suffices  for  each  layer  yielding  19 
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Using  the  fact  ||x|||  =  (x,x)  and  the  normalization  constraint  on  the  set  of 
variables  {plft}f,t,  upon  expansion  of  the  squared  objective  function  and  drop¬ 
ping  constant  terms  in  the  expression  to  be  minimized  we  obtain  the  maximal 
correspondence  model  with  objective  function 

L  6  3  L 

max  II  Pf-t1)  (15) 

pf-t  1=1  f=l  t=0  m= 1 

subject  to  the  same  linear  constraints  as  above.  Hence,  as  stated  above  for  the 
general  least  squares  formulation,  the  minimuml  error  and  maximal  correspon¬ 
dence  formulations  are  equivalent  Our  algorithmic  results  for  Rubiks  Cube  are 
identical  to  those  presented  in  Analysis  of  Constrained  Variants  of  the  Map- 
Seeking  Algorithm12 .  For  comparative  purposes  it  should  be  noted  that  in  that 
paper  they  define  a  ‘layer’  to  consist  of  at  most  a  single  90  degree  counter 
clock  wise  (ccw)  rotation  about  any  face  (yielding  7  possible  mappings  at  each 
layer)  whereas  we  define  a  layer  to  be  any  rotation  about  any  of  the  faces  (19 
possible  mappings)  and  the  term  “multilinear  model”  will  refer  to  either  form. 
This  multilinear  model  should  not  to  be  implemented  directly  as  it  is  written. 
The  bilinearity  of  the  inner  product  allows  the  summation  terms  to  be  brought 
inside  the  brackets  and  permits  the  formation  of  the  “superposition  of  transfor¬ 
mations”  at  layer  l  by  Pl  =  plj  tPlf  t  to  yield  the  formulation  with  objective 


function 


L 


max  (O,  TT  Pml)  (16) 

m= i 

which  greatly  reduces  the  computational  effort  required  to  solve  the  problem. 
The  formulation  of  recognition  problems  directly  using  this  model  is  what  the 
Map-Seeking  Algorithm  is  based  upon.  By  using  a  weighted  listing,  with  real 
coefficients  restricted  to  [0,1],  of  all  possible  combinations  embeds  the  problem 


unique  possible  mappings  for  each  layer  considered  rather  than  the  24  =  6  X  4  possibilities 
in  the  given  model.  We  have  chosen  the  above  model  as  it  provides  an  elegant  and  simple 
formulation  for  exposition. 

12Harker,S.,  Vogel  C.,  Gedeon  T.,  Journal  of  Mathematical  Imaging  and  Vision.  Volume 
29  Issue  1,  pp  49-62. 
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into  the  real  domain  for  computational  advantage  and  allows  for  the  probabilis¬ 
tic  interpretation13  which  allows  for  further  exploitation14.  At  a  foundational 
level  this  model  is  the  multilinear  least  squares  error  formulation  (14).  In  the 
3D  recognition  problem  the  superposition  of  transformations  involves  both  su¬ 
perpositions  of  rotations  and  superpositions  of  translations.  The  exploitation 
of  the  bilinearity  changes  the  complexity  of  the  problem  from  multiplicative  to 
additive. 

0.3.1  The  Multilinear  Model  Optimization  Updating  Step 

The  decoupled  linear  constraints  associated  with  the  model  allows  one  to  up¬ 
date  the  probabilities  associated  with  each  layer  in  parallel  which  is  considerably 
more  efficient  than  a  sequential  procedure.  Using  a  projected  gradient  technique 
to  calculate  a  search  direction  dl  at  each  layer  the  19  probabilities  associated 
with  the  19  distinct  rotational  mappings  of  the  cube  at  each  layer  can  be  simul¬ 
taneously  updated  via  a  step  pl  =  pl  +  ad1.  One  can  either  use  a  line  search 
procedure  to  calculate  the  step  length  a  or,  upon  normalization  of  the  search 
direction  d,  use  a  constant  step  length  a  at  each  iteration.  Our  experience  sug¬ 
gest  the  constant  step  length  is  computationally  just  as  efficient  as  a  line  search 
procedure. 

The  alternative  choice  to  a  parallel  implementation  is  a  sequential  procedure 
updating  each  layer  in  turn,  and  using  the  updated  probabilities  at  layer  l,  Pl  in 
the  calculation  of  the  projected  gradients  at  layer  l  +  1.  By  exploiting  adjoints 

(O,  PL  ■  ■  ■  Pl  ■  ■  ■  P1!)  =  ((Pl)T  ■  ■  ■  ( Pl)tO ,  P1-1  ■  ■  ■  Pll)  (17) 

the  updating  procedure  at  layer  l  can  be  made  efficient. 

Here  is  a  sequence  of  distributions  showing  the  convergence  to  dirac  measures 
for  a  6-layer  problem.  In  the  software  this  information  is  animated.  The  first 
image  of  the  sequence  shows  a  uniform  distribution  for  all  6  layers.  As  the 

13Note  that  a  solution  can  always  be  found  at  a  dirac  probability  measure. 

14These  additional  exploitations  based  upon  the  probabilistic  interpretation  will  be  dis¬ 
cussed  in  a  future  paper 
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optimization  algorithm  proceeds  all  the  distributions  approach  a  dirac  measure 
fairly  uniformly  since  they  are  all  being  updated  simultaneously.  (That  code 
was  run  in  parallel.) 

0.3.2  Rubiks  Cube  Results 

The  parallel  and  sequential  implementation  using  projected  gradients  for  the 
search  direction  produced  the  same  success  rate  in  the  determination  of  the 
global  minimum.  This  is  worthy  of  note  since  the  two  procedures  actually 
approach  the  solution  along  a  potentially  different  trajectory. 

These  results  are  shown  in  the  bottom  graph  of  Figure  1  as  a  function  of  a 
varying  number  of  layers.  For  an  n  layer  problem  we  generated  a  random  initial 
mixup  of  the  cube  consisting  of  n  random  rotations  of  the  cube.  Each  random 
rotation  was  one  of  the  3  distinct  nontrivial  rotations  about  a  given  (random) 
face. 


Success  Rale 


Success  Race  for  Conjugate  and  Rotational  Transformations 


1 

i  \ 

■ 

Figure  25:  Success  rate  of  the  algorithm  for  the  two  methods  involving  search 
directions  of  simple  permutations  and  the  other  using  conjugates  for  the  search 
directions. 
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0.3.3  Conjugates  and  Local  minimum 


In  analyzing  those  problems  where  the  optimization  algorithm  failed  to  find  the 
global  minimum  we  found  a  striking  number  of  those  cases  occurred  when  the 
initial  mixup  of  the  cube  involved  a  conjugation.  This  phenomena  is  illustrated 
in  Figures  29  which  shows  the  green  face  rotated  3  times  counterclockwise  (270 
degrees)  followed  by  a  rotation  of  the  white  face  by  two  turns,  and  finally  a 
rotation  of  the  green  face  by  1  counterclockwise  rotation. 


Figure  26:  The  cube  on  the  left  shows  an  initial  conjugation  of  the  Green  and 
White  faces.  The  cube  on  the  right  shows  the  local  minimum  obtained  by  using 
elemenatary  rotations. 


Such  a  rotation  can  be  denoted  by  ghg^1  with  g  being  the  group  element 
“rotate  the  green  face  3  times”.  The  inverse  of  this  element,  g -1,  is  “rotate 
the  green  face  once”  since  four  rotations  about  any  face  gives  back  the  identity 
map  (no  rotation  at  all).  Associated  with  these  conjugations  are  local  minimum 
which  have  a  very  large  basin  of  attraction  so  unless  one  starts  very  close  to  the 
global  optima  one  gets  trapped  in  a  local  minina  using  projected  gradients  as  a 
search  direction.  This  phenomena  extends  to  any  size  problem.  If  a  conjugation 
occurs  anywhere  within  the  initial  mixup  of  the  cube,  say  the  transformation 
applied  to  the  cube  is  gih\h2h^,g^1  (=  ghg~x)  then  a  local  minimum  will  exist 
with  a  large  basin  of  attraction.  The  noncommutativity  of  the  transformation 
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group  causes  difficulty  as  using  projected  gradient  will  simply  yield  the  solution 
of  h^1  rather  than  gh~1g~1.  To  the  optimization  algorithm  the  composite 
transformation  ghg~l  ss  gg~1h  =  h.  An  analogy  in  ID  would  be  a  function  as 
shown  in  Figure  3  where  a  gradient  based  algorithm  will  inevitably  tend  to  the 
local  minimum. 


f 


Figure  27:  A  function  with  a  local  minimum  having  a  large  basin  of  attraction. 


One  can  improve  the  performance  of  the  algorithm  by  directly  employing 
conjugations  in  the  model  itself.  In  the  objective  function  (29)  the  terms  Pl 
are  superpositions  of  rotations  about  the  six  faces  of  the  cube.  This  objective 
function  can  be  altered  by  letting  each  superposition  Pl  be  a  superposition  of 
conjugations.  This  change  in  the  model  simply  says  one  can  obtain  a  solution 
using  conjugations,  which  includes  all  the  simple  rotations  since  for  g  and  h  being 
rotations  on  opposite  face  ghg~x  =  h,  and  is  such  that  a  projected  gradients 
algorithm  applied  to  it  can  avoid  more  local  minimum  than  the  simple  model. 

Because  the  number  of  possible  permutations  is  increased  from  19  distinct 
possibilities  to  235  distinct  possibilities  one  may  expect  the  solution  to  be  de¬ 
termined  by  searching  over  a  larger  space  to  yield  a  better  result.  Using  conju¬ 
gations  as  search  directions  this  is  true.  However  there  are  many  other  choices, 
such  as  choosing  commutators  ghg^1} i_1  which  when  used  in  conjunction  with 
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conjugations  yield  485  distinct  permutations,  and  using  this  larger  set  yields 
poorer  results  than  when  using  only  conjugations.  Simply  enlarging  the  num¬ 
ber  of  possible  permutations  does  not  guarantee  the  algorithm  will  find  a  better 
solution15 

0.3.4  Conjugate  Search  Results 

Using  the  above  multilinear  model  generating  random  initial  images,  based  on 
200  random  samples  and  an  initial  uniform  distribution,  we  obtained  the  follow¬ 
ing  success  rates  for  varying  number  of  layers  as  displayed  in  the  top  graph  of 
Figure  1.  In  Figure  4  the  computation  times  show  how  the  computation  times 
increase  roughly  linearly  with  respect  to  the  number  of  layers.  The  computa¬ 
tional  times  are  included  to  show  relative  computation  times  -  the  code  has  not 
optimized  for  speed. 


Figure  28:  A  function  with  a  local  minimum  having  a  large  basin  of  attraction. 


15In  the  extreme  case  one  can  take  the  superposition  to  be  a  linear  combination  of  every 
discrete  possibility  so  that  only  one  layer  is  required.  In  this  case  the  problem  reduces  to 
a  linear  programming  problem  which  is  convex  so  the  global  optima  can  always  be  deter¬ 
mined.  However  this  is  not  computationally  feasible  since  the  number  of  such  permutations 
is  exceedingly  large. 
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0.4  DC  Programming  Model 

One  of  the  challenges  in  attempting  to  solve  any  object  recognition  problem  us¬ 
ing  optimization  resides  in  the  difficulty  of  determining  a  global  optimum  to  the 
model  one  employs  for  modeling  the  problem.  If  the  algorithm  used  to  solve  the 
optimization  model  determines  a  local  minimum  such  a  formulation  is  useless. 
Since  there  is  no  optimization  algorithm  which  guarantees  a  global  minimum 
for  all  models  it  is  desirable  to  choose  the  optimization  model  such  that  it  is 
a  convex  model  (a  convex  objective  function  with  convex  constraints)  whereby 
it  is  possible  to  guarantee  a  global  minimum  since  for  a  convex  optimization 
model  any  local  minimum  is  unique  and  hence  a  global  minimum16. 

In  recent  years  there  has  been  an  increasing  awareness  that  convex  and 
quasi-convex  optimization  models  can  play  an  important  rule  in  recognition 
problems.  To  our  knowledge  the  general  recognition  problem,  which  does  not 
assume  a  known  correspondence  or  knowledge  of  the  transformation  mapping, 
has  not  been  addressed  with  convexity  (global  optimality  issues)  in  mind.  The 
Map  Seeking  Circuit  (MSC)  algorithm  of  Arathorn17  has  achieved  some  success 
in  addressing  the  global  optimality  problem  and  uses  the  concept  of  superposi¬ 
tion  to  obtain  a  computationally  feasible  model  for  recognition  problems.  This 
concept  follows  directly  from  the  discretization  of  the  problem  which  gives  a 
multilinear  programming  problem  and  the  bilinearity  of  the  inner  product  used 
for  the  objective  function.  Our  modeling  procedure  does  not  require  discretiza¬ 
tion  of  space,  addressing  the  correspondence  problem  as  an  assignment  problem, 
and  uses  a  minimum  error  model  followed  by  a  transformation  of  variables.  We 
show  the  relationship  between  the  maximal  correspondence  model18  employed 
by  MSC  and  the  minimum  error  model.  In  our  models  the  unknown  variables 
can  be  interpreted  as  probabilities  which  allows  for  additional  techniques  to  be 

1(>The  uniqueness  is  in  terms  of  the  objective  function  value;  the  argument  of  the  minimum 
is  not  necessarily  unique. 

1 '  Map  Seeking  Circuits  in  Visual  Cognition:  A  Computational  Mechanism  for  Biological 
and  Machine  Vision.  Stanford  Press,  2002. 

18The  maximal  correspondence  formulation  is  a  multilinear  programming  problem  which  is 
non  convex. 
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brought  to  bear  on  the  problem. 

When  a  convex  optimization  model  is  not  available  a  good  alternative  is  to 
construct  the  model  using  a  difference  of  convex  (DC)  functions. 

Towards  this  end  we  formulate  shape  based  object  recognition  problems  us¬ 
ing  an  optimization  model  with  a  DC  objective  function  and  convex  constraints. 
Such  a  procedure  provides  a  theoretical  foundation  for  further  research. 

This  paper  is  organized  into  two  sections.  In  Section  1  basic  modeling  of 
shape  based  recognition  problems  is  addressed  using  discretization  of  the  Eu¬ 
clidean  transformation  variables.  This  modeling  yields  a  multilinear  program¬ 
ming  problem  which  can  be  transformed  into  a  DC  optimization  problem  using 
the  exponential  transformation  of  variables. 

Section  2  consist  of  applications  including  Rubik’s  Cube,  point-to-point 
matching  problems,  and  the  generalization  to  point  to  surface  matching.  These 
applications  illustrate  the  implementation  of  the  modeling  procedure  and  high¬ 
light  computational  considerations  and  aspects.  The  modeling  of  the  feature 
correspondence  problem  as  an  assignment  problem19  is  implemented  and  ex¬ 
plained  on  a  point  to  point  matching  problem. 

0.4.1  Discretizing  a  Model 

Any  continuous  optimization  problem  can  be  discretized  over  the  variables  to 
obtain  a  “probabilistic  model”. 

Consider  the  two  variable  problem 

min  f(x,y)  (18) 

x,y 

where  /  is  any  continuous  function  in  both  arguments. 

Discretizing  the  variable  x  into  Nx  possible  values  {x±,  £2,  •  •  -  ->xnx}  and  the 
variable  y  into  Ny  values  {2/1 , 2/2 ?  •  •  •  ,VNy}  problem  (18)  can  be  approximated 
as 

19The  assignment  problem  is  well  known  in  the  Linear  Programming  community  and  can 
be  found  in  most  book  on  linear  programming. 


37 


Nx  Ny 

min  /L  'y)pxiPyj  /  (x, ,  Vj ) 

,Pyj  ^  ' 

J  »=1  3  =  1 

subject  to 
= 1 

i=l 

Ny 

Y^Pvi  = 1 
j=i 

>  o  for  <  =  1, . . . ,  Nx 
Pvi  >0  for  j  =  1, . . . ,  Ny 


(19) 


where  the  vectors  Px  =  {pXl  ,pX2,...,  pXNx  }  and  Py  =  {pyi  ,pV2,...,  pVNy  } 
can  be  interpretted  as  probability  distributions.  The  global  solution  to  problem 
(19)  will  occur  at  dirac  probability  measures  Px  =  5 j  and 

Py  =  5j  where  /  achieves  a  minimuml  value  at  the  values  x i  and  y-  among  all 
the  discretized  values.  This  formulation  is  simply  a  crude  method  for  looking  at 
all  possible  (discretized)  values  and  choosing  the  minimum  value.  This  idea  can 
be  applied  to  any  unconstrained  or  decoupled  constrained  optimization  problem. 
It  turns  an  n  —  variable  problem  into  an  n  — linear  programming  problem  which 
is  linear  in  each  probability  P. .  The  formulation  (19)  is  not  computationally 
feasible  for  a  problem  with  a  large  number  of  probabilities  and/or  discretizations 
unless  the  objective  function  /  has  the  property  of  bilinearity  possessed  by  inner 
products.  In  such  a  case  the  summation  terms  can  be  brought  inside  the  inner 
products  thereby  reducing  the  computational  complexity  from  multiplicative  in 
the  number  of  variables,  NxNy,  to  additive,  Nx  +  Ny. 

In  modeling  rigid  body  motion  the  minimum  norm  error  squared  model 
defined  by  the  Euclidean-invariant  function 


d(Ob,  Im)  =  min  \\Ob  —  (R(Im) +  T)\\2  (20) 

R,T 

where  R  is  a  an  orthogonal  rotation  and  T  is  a  translation,  is  an  appropriate 
model  and  determines  a  pseudo  metric  distance  between  an  object  and  an  image. 
Discretizing  this  problem  (2D  for  illustrative  purposes) 
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Nr  Nx  Nv 

d(Ob,  Im)  =  min  PRjPxkPVl\\Ob  -  ( Rj(Im )  +  Tki) 


PRj  ,Pxk  ,PVl 


j= 1 k—1  1= 1 


=  min  (06,06)  -  2(06,  Tl^=iPRjRj(Im)  +  Efc=i  ES  P*Atm)+ 

pRj,p.k,pyi  J 

2(E^2 \PRjRj{Im),  Ef=  1  Yd=iPxkPvlTk,i)  +  T,k= i  T,i^iPxkPyi(Tk,i,Tkii)  +  (Im,  Im) 

(21) 

where  Rj,  xk,  and  yi  are  the  discretized  rotation  and  translation  values.  This 
can  be  further  simplified  since  7] j.  =  {;rfc,2/z}. 
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0.5  The  DC  Programming  Formulation 

The  multilinear  programming  model  formulation  of  the  problem  as  illustrated 
in  (19)  is  still  inherently  difficult  to  solve.  The  higher  the  multilinearity  the 
more  difficult  the  problem  is  to  solve  using  traditional  algorithms.  However  this 
multilinear  programming  problem,  which  is  nonconvex,  can  be  transformed  into 
a  difference  of  convex  functions  as  follows: 

Nx 

Note  that  the  equality  constraint  in  the  multilinear  model,  E^-  =  1,  is 

i—1 

equivalent  to  | Px \ | ]  =  1  since  each  px.  >  0.  (Any  lp  norm,  p  >  1  can  be  used  in 
the  model.)  This  is  a  normalization  condition  which  is  neccessary  to  avoid  the 
trivial  solution  Px  =  Py  =  0. 

This  condition  on  the  variables  p.  can  be  achieved  by  scaling  each  such 
variable  by  its’  norm  so  the  multilinear  model  above  is  equivalent  to 


jV„ 


mm 

P^i  ,Py. 


AT* 

EEi 

,=1  j= i 


Py , 


-  fixuVi) 


subject  to  (22) 

Pxi  >  0  for  i  =  1, ...  ,NX 
py]  >0  for  j  =  1, . ... ,  Ny 

which  eliminates  the  equality  constraints.  Factoring  out  the  norms  since  they 
are  independent  of  the  summation  indices  and  taking  the  natural  logarithm,  In, 
of  the  objective  function  gives 
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NX  Ny 

min  ln(S^Yj>Xipyjf{xuyj))  -  Zn(||Px||i)  -  MIIPylli) 

P*i  I Pyj  ,  .  , 

*=1  J=1 

subject  to  (23) 

Pz,  >0  for  *  =  1, ,  Nx 
Py:i  >  0  for  j  =  1,,.  . . ,  Ny 
For  each  probability  component  variable  Pi  define 


qi  =  ln(pi)  (24) 

where  ln[ 0]  =  —  oo.  Thus  eqi  =  pi  which  transforms  the  multilinear  pro¬ 
gramming  problem  (22)  to 
the  equivalent  problem 

JV,  Ny  JVX  Ny 

min  ln(S ^  V f(Xi,  y A)  —  e9j=i)  —  ln(S^  e9M 

1^iQVj  ,  ,  ,  , 

1=1  J  —  l  1=1  J  — 1 

subject  to  (25) 

qXi  <0  for  i  =  1, . . .,  ATj. 
qyj  <0  for  j  =  1, . . . ,  iVj, 

Assuming  the  function  f(xi,yj)  >  0,  for  all  indices  ?’  and  j,  the  objective 
function  is  a  difference  of  convex  functions.  This  critical  property  is  satisfied  in 
our  object  recognition  applications  because  the  objective  function  is  an  invariant 
metric  defined  by  a  norm  ||  •  ||2  >  0. 

In  this  model  the  objective  function  is  a  difference  of  two  convex  function, 

Nx 

since  is  a  convex  function  and  the  In  operator  preserves  convexity,  and 

1=1 

the  nonpositive  constraints  qXi  <  0  are  convex. 

An  optimization  problem  where  the  objective  function  and  constraints  are 
a  difference  of  convex  functions  is  referred  to  as  a  DC  (difference  of  convex 
functions)  programming  problem. 

The  strategy  in  DC  programming  algorithms  for  such  a  problem  is  to  solve  a 
sequence  of  convex  optimization  problems  where  for  each  problem  one  replaces 
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the  nonconvex  terms,  in  this  case  the  terms  ^Ei=ue9x<]  ancl  ^nEiAj  e<lvj  ]> 
by  their  affine  approximation  which  is  the  first  two  terms  in  the  Taylor  series 
expansion  of  the  function  evaluated  at  the  previous  iterate.  The  constant  term  in 
the  expansion  can  be  dropped  since  any  constant  term  in  the  objective  function 
does  not  change  the  argument  of  the  minimum.  Thus  for  a  general  DC  objective 
function 


z(x)  =  /( x)  -  g( x)  (26) 

one  uses  the  objective  functions 

z(xk)  =  f(xk)-Vx(g)  U— 1Z*  (27) 

where  ar  1  is  an  optimal  solution  to  the  previous  problem  in  the  sequence. 
For  the  first  iterate  k  =  1  any  feasible  point  suffices  for  x°.  This  is  the  strategy 
in  the  primal-dual  subgradient  method  proposed  by  Tao  and  An20.  The  dual 
variables  of  the  qi  variables  in  this  method  for  the  above  problem  are  precisely 
the  probability  components,  .  Such  an  algorithm  applied  to  shape  based 

Z-i=  1  e  * 

recognition  problems  would  use  pruning  (eliminating  variables)  along  the  way 
for  computational  efficiency. 

1  Theory 

1.1  Moving  Beyond  Deterministic  Mappings 

In  the  previous  technical  report  we  discussed  the  “Least  Squares  Recognition 
Problem”  and  its’  application  to  the  Rubiks  cube  problem.  The  “Least  Squares 
Recognition  Problem”  essentially  is  the  model  involving  the  Frobenioiis  norm 
minimization  of  the  error  between  an  Object  O  and  an  image  X,  and  a  trans¬ 
formation  group  Q , 

20Pham  Dinh  Tao,  Le  Thi  Hoai  An.  D.C.  Optimization  Algorithms  for  Solving  the  Trust 
Region  Subproblem.  SIAM  Journal  of  Optimization,  V.18,  (1998)  pp  476-505. 
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min  | \0-  gl\ \%  (28) 

g&S 

Essentially  all  our  recognition  problems  can  be  abstractly  viewed  via  this 
model.  The  process  of  discretization  allows  the  problem  to  be  stated  equivalently 
as 

L 

max  (O,  TT  PmI )  (29) 

m=l 

where  for  layer  l,  Pl  =  is  a  superposition  a  components.  (See  the 

U 

previous  report  for  the  details  here,  i.e.,  f=”face”,  t=” turns”,  etc.) 

This  model  led  to  the  results 


Success  Rale  for  Conjugate  and  Rotational  Transformations 


1 

Figure  29:  Success  rate  of  the  algorithm  for  the  two  methods  involving  search 
directions  of  simple  permutations  and  the  other  using  conjugates  for  the  search 
directions. 


which  are  less  than  spectacular  in  terms  of  finding  the  global  minimum. 
While  using  conjugates  of  rotations  in  the  superpositions  helps,  for  large  layers 
it  too  tends  to  fail  in  finding  the  global  minimum  and  consequently  a  new  idea  is 
required.  We  have  tried  using  penalty  methods  whereby  if  we  obtain  a  nonglobal 
solution  we  simply  add  a  penalty  term  to  the  objective  function  which  forces 
the  algorithm  to  avoid  converging  to  that  same  point  when  the  algorithm  is  ran 
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again  with  the  new  objective  function.  While  this  “works”  it  tends  to  require 
an  excessive  number  attempts  -  it  has  to  repeatedly  add  new  components  to  the 
penalty  component.  In  short  it  ends  up  searching  a  large  portion  of  the  search 
space. 

There  is  a  potentially  better  way  to  address  this  nonglobality  issue  by  ex¬ 
ploiting  an  aspect  of  the  probabilistic  interpretation  which  the  discretization 
introduces.  Namely  there  is  an  underlying  theoretical  aspect  which  we  discov¬ 
ered  previously  which  actually  led  us  to  this  new  least  squares  formulation  and 
the  recognition  that  the  problem  is  in  fact  equivalent  to  the  Maximal  Corre¬ 
spondence  Problem.  That  theorem  is  the  categorical  result  of 


Vdet.  H  'Pprob  (30) 

where  Vdet  is  the  category  of  deterministic  mappings  (essentially  what  we  are 
employing  right  now)  and  Vprob  is  the  category  of  probabilistic  mappings.  Our 
current  model  and  algorithmic  solution  does  not  use  this  fact  outside  the  ob¬ 
servation  that  we  can  embedd  the  (discrete)  combinatorial  problem  into  the 
continuous  domain  of  (the  category  of)  Probabilistic  Mappings.  It  is  this  aspect 
which  we  are  now  trying  to  further  exploit  now  that  we  have  an  understanding 
how  it  translates  into  actual  computations. 

What  we  have  determined  is  that  to  work  in  the  category  of  probabilis¬ 
tic  mappings  is  that  in  the  key  concept  revolves  around  the  superposition 

Pl  =  tPj  t.  The  maps  (matrices)  Pj  t  are  all  permutation  maps  which 

f,t 

are  deterministic.  To  work  in  the  category  of  probabilistic  mappings  we  should 
be  employing  stochastic  matrices  which  are  more  general  than  simple  permu¬ 
tations.  A  stochastic  matrix  is  a  square  P  such  that  each  of  its’  rows  sums  to 
one.  (Permutations  matrices  are  a  special  instance.)  The  problem  with  permu¬ 
tation  matrices  appears  to  be  that  they  require  the  convergence  to  take  place 
along  certain  search  directions  which  “pass  over  hills”  before  reaching  the  valley 
(global  minimum).  By  enlarging  the  search  space  (number  of  search  directions) 
to  include  stochastic  matrices  convergence  to  a  global  minimum  should  be  able 
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to  take  a  “circuitous  route”  to  the  global  minimum.  In  the  Theory  of  Markov 
Processes  the  stochastic  matrices  are  viewed  as  transition  matrices  -  they  al¬ 
low  you  to  transition  from  one  probability  to  another  without  moving  along 
a  deterministic  route.  In  practice  this  approach  fails  because  the  basis  set  is 
extremely  large  and  the  procedure  requires  excessive  computational  time  to  be 
able  to  effectively  search  that  space.  It  is  a  stochastic  method  which  like  all  such 
methods  requires  a  very  large  number  of  attempts  to  find  the  global  minimum 
with  no  quarantee  that  the  result  one  gets  is  indeed  the  global  minimum. 
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1.2  The  Category  of  Probabilistic  Mappings  V 

In  the  category  of  probabilistic  mappings  the  objects  are  measurable  spaces 
(X,B),  where  X  is  a  set  and  B  is  a  cr-algebra  on  X.  A  morphism  between  two 
such  objects 


(X,B)^(Y,B') 


(31) 


is  a  bivariate  function 


T:Xx8'-i[0,l] 

such  that  for  each  fixed  x  £  X,  the  induced  map  defined  by 


(32) 


Tx  :  B'  -  [0,1] 

:  B'  T[x,B') 


(33) 


is  a  probability  measure  on  the  measurable  space  ( Y,B' ),  and  such  that  for 
each  fixed  B'  £  B' ,  the  induced  map  defined  by 


Tb>  :  X  —>  [0,1] 

:  x  i— >  T(x,B’) 


(34) 


is  a  measurable  function  on  the  measurable  space  (A,  B) .  The  composite  of 
two  probabilistic  mappings 


(A,  B)  (Y,  B')  (Z,  B")  (35) 

is  defined  by 

[UoT](x,B")=  [  UB"{y)dTx  (36) 

Jyev 

Observe  that  UB"  is  a  real  valued  function  on  Y, 
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UB"  :  —  [0,1]. 

The  m-  expectation  of  /,  for  /  any  real  valued  function  /  and  m  any  measure, 
is  defined  by 

Em(f)  =  Jfdm.  (37) 

Consequently  the  composite  [U  o  T\(x,B”)  is  the  T^-expectation  of  Ub", 

[UoT](x,B")=Et„(Ub").  (38) 

This  law  of  composition  is  associative. 

Every  measurable  mapping 

(X,B)  (Y,  B') 

may  be  regarded  as  a  probabilistic  mapping 

(X,B)  (Y,B') 

defined  by  the  one  point  measure 

SP(x,-)  :  &  -+  [0,1] 

f  1  If  p(x)£B'  (41) 

:  B'  ^  { 

[  0  If  p{x)  <£  B' 

Thus  Sp  assigns  to  x  the  one-point  measure,  or  dirac  measure,  on  (Y,  £>') 
which  is  concentrated  at  p(x).  Probabilistic  mappings  which  are  one-point  mea¬ 
sures  are  called  deterministic  mappings.  We  use  the  6  notation  to  denote 
all  such  deterministic  mappings,  and  the  subscript  to  denote  the  corresponding 
measurable  mapping  which  corresponds  to  it. 

If  ( X ,  B)  is  a  measurable  space  and  idx  :  X  — >  X  is  the  identity  map  on  X 
as  a  set,  then 

(X,B)S^(X,B)  (42) 


(39) 

(40) 
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is  the  identity  mapping  on  (X,B)  in  V.  With  these  facts,  composition, 
associativity,  and  identity,  it  follows  that  V  is  a  category. 


2  Some  Properties  of  V 

We  enumerate  some  properties  of  the  category  of  probabilistic  mappings  re¬ 
quired  for  modeling  the  recognition  problem. 


1.  If 


(X,B)  -A  (Z,B") 


are  deterministic  mappings  then 


(43) 


Sqop  =  SqO  Sp.  (44) 

With  this  result  we  obtain  the  subcategory  of  deterministic  mappings  Vdet 
which  has  the  same  objects  as  V  but  only  those  arrows  which  are  deter¬ 
ministic.  The  category  Vdet  is  equivalent  to  the  category  of  measurable 
spaces,  often  denoted  by  Meas  or 

Mes. 

2.  A  probabilistic  mapping 


1  (45) 

where  1  is  the  one-point  space  with  the  only  possible  algebra  defined  on  it 
can  be  construed  as  a  probability  measure  on  (A,  B)  since  P  is  (isomorphic 
to)  a  univariate  function, 


P:B ->  [0,1]. 

The  composition  of  a  probabilistic  mappings  T  with  P 
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1  (X,B)  (Y,&)  (46) 

is  defined  as  the  induced  distribution  on  (Y,B').  When  T  is  deterministic 
then  classically  T  is  called  a  “random  variable” . 

p 

Given  that  1  — >  ( X ,  B)  is  just  a  probability  measure  on  ( X ,  B)  it  makes 
sense  to  call  a  probabilistic  map 

(X,B)^(Y,B')  (47) 

a  variable  probability  measure.  This  concept  is  one  of  the  generalizations 
that  category  theory  provides;  we  no  longer  need  consider  just  points 
which  are  maps  whose  domain  is  the  terminal  object  which  is  denoted  by 
1.  Thus  the  probabilistic  mapping  P  above  is  a  point,  while  T  is  referred 
to  as  an  (generalized)  element.  (The  older  terminology  for  generalized 
element  is  variable  element.) 

3.  The  category  V  has  products.  This  follows  from  the  standard  tensor 
product  construction.  Given  probabilistic  mappings  /i  and  v  the  product 
is  given  by 


where  (/j,®  u)  (t,  (B1,B2))  =  n(t,  Bi)i/(t,  B2)  and  the  projections  Sni  and 
5W2  are  the  deterministic  probabilistic  mappings  determined  from  the  canon¬ 
ical  projections 


(Xl,Bi)  -* -  (.X"i  X  ®1  ®2)  - *-  (^2)®2) 

7Ti  7T2 

where  Bi  ®  B2  is  the  product  of  Bi  and  B2  in  the  category  of  measurable 
spaces.  The  commutativity  of  the  biproduct  diagram  follows  from 
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(48) 


{5-nO  n®v)(t,B i)  =  fXixX2  57ri((x1,x2),B1)d(n<E>  v)t 

=  fXlxx2XB1xx2d(fj,0iy)t 

=  (fi®v)t(B1xX2) 

=  n(t,  B±)  v(t, 

=1 

so  Sni  o  =  fi  and  similarly  SW2  o  <g»  v)  =  v. 

Arbitrary  products  are  similar  using  the  standard  construction  for  arbi¬ 
trary  products  in  the  category  of  measurable  spaces. 

3  Further  properties  of  V 

1.  The  inclusion  functor  of  the  subcategory  P^et  c— >  V  has  a  right  adjunct 


defined  by 


i  H  V 


V(X,B)  =  {1$(X,B)} 


which  is  the  set  of  all  probability  measures  on  (A,  B) ,  and  endowed  with 
the  smallest  cr-algebra  such  that  for  each  A  £  £>,  the  evaluation  function 


ev(-,A)  :  V{X)  ->  [0,1] 

:  P  i — r  P(A) 

is  measurable,  when  [0, 1]  has  the  Borel  cr-algebra. 
We  denote  this  cr-algebra  by  VB. 

On  arrows,  for  (X,B)  ( X',B' ),  define 


(DX,  VB) 


VT 

- -  (VX',DB') 


49 


by 


1 


UToP  G A 
otherwise 


[D(T)\(P,A)  = 

The  counit  of  the  adjunction, 

e  :  i  0D-4  Id-p 

is  given  at  component  (X,  B )  by  the  evaluation  probabilistic  mapping 


ev(X,B) 

i(V(X,B))  - -  (X,B) 


where 

ev(x,B)(P,B)  =  P{B). 
This  is  a  natural  transformation  because 


(X,B) 

T 

(. X',B ') 
commutes  since 


i(V(X,B)) 

VT 

i(V(X,B)) 


eV(X,B) 


(X,B) 


T 


ei\x>,B') 


( X',B ') 


fx  n^£)  d[ev{XtB)(P,-)\ 

Jim*in^00{Er=i  “«rB,(»')j 

limitn-, oo  {EIU  aiev(Pi  T(-,  B'Y1{a\) } 
limitn-yoo  {£ILi  atP  (T(-,  B')_1(a*)) } 

6*  some  real  value 


whereas 
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V  o  eV(X',B')(P,  B')  =  fx,  ev(x>, &')(’’ B') 


d[UT(P,  •)] 


limits oo  {Ej=i  bjV(T)(P,  ev(X', &')(-,  B ,)_1(^)} 


But 


,  .  1  iffPoTGet>jr/(-,B,)“1(^) 

V(T)(P,evx,(;B')~1m)=  >  ■  K>} 


0 


otherwise 


=  < 


1  iff  (PoT){B')  &b) 

=  fx,T(-,B')dP 

=  limitn^o o  {S"=1  aip  (T'(-,B,)_1(a|))} 

=  6* 

0  otherwise 

Thus  at  each  fixed  m,  there  exist  only  one  nonzero  term 

b//  (the  index  k  depends  upon  the  index  m)  such  that  &*  €  [bk  —  e/2,  bk 
e/2).  Thus,  using  the  fact  the  e  intervals  are  disjoint, 

m 

J2bj'D(T)(P,evx>(;B')-1m=b^ 

3= 1 

and  consequently 

(T>(T)  o  evx>)(P,  B ')  =  limitn^00{bk'}  =  6* 


2.  The  coproduct  of  a  collection  of  objects  {(X,;,  is  given  by 


(UiejXj,<UBi» 
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where  it  is  the  inclusion  map  into  the  disjoint  union  of  the  sets,  and  (U Bf) 
is  the  cr-algebra  generated  by  the  disjoint  union  of  all  the  component  a- 
algebras.  The  inclusion  map  is  a  measurable  function  with  respect  to 
this  algebra  on  the  set  U i^iXi.  Each  such  inclusion  Li  has  a  measurable 
retraction  ry;  choose  an  element  x j*  and  define 


L'i  • 

:  y 


x. 


y  if  y€  Xt 
Xi *  otherwise 


This  is  measurable  since 


r  ( 


ri\Bi)  =  { 


Bi  U 


Bi 


U  Xi 

j&I 

\  i/*  / 


ifxi*  e  Bi 


otherwise 


When  challenged  with  a  family  of  measurable  mappings  the  unique 

mapping  9  making  the  diagram 


( Y,C ) 


(OieIXi,  (OBi)) 


commute  is  given  by 


0  —  OC’i  O  • 


3.  V  has  coequalizers. 

Consider  the  two  parallel  arrows 
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(X,B)  ==£(X',B') 

Define 

B"  =  {B[  G  B'\ix  G  X  S( x,  S')  =  T(ar,  S')  and 

%B{ns;.)  =  T(i,Bfn5J.)} 

Lemma.  The  set  B"  is  a  sub  er-algebra  of 
B' . 

Proof,  (i)  0,X  G  B"  since  Sx(0)  =  0  =  Tx(0)  and  Sx(X)  =  1  =  Tx(X). 

(ii)  Suppose  B'  G  B"  so  SX{B')  =  TX{B').  Then  since 

1  =  Sx(X)  =  SX(B')  +  SX{B'C) 

=  TX{B')  +  SX{B'C) 

from  which  it  follows 

1  -Tx(B')  =  SX{B'C) 

TX{B'C )  =  SX(B'C ) 

hence 

B'c  G  B". 

(iii)  Let  {B-  G  B"}iei.  Then 

S^B'UB'j)  =  SX{B[)  +  Sx(B'j)-  SX{B[  n  S') 

=  Tx(Bl)  +  Tx(B')-Tx(B'nB’) 

=  T^UBj) 

By  iteration,  for  any  finite  integer  n,  5X(U”=1B-)  =  TX(U”=1-B^).  Since 
SX(U ieiBD  is  bounded  from  above, 

5x(Uie/B-)  =  limitn^00{Sx(Ul=1B')} 

=  limitn^oo  {Tx  (Uf=1  i?  ■ ) } 

=  Tx(Ui£iB'i) 
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qed. 


We  can  now  show  that  the  coequalizer  of  the  parallel  pair  S  and  T  is  given 
by  the  deterministic  mapping  Sidx, 


(X,B) 


S 

=^(X',ff) 


(. X',B ") 


which  is  the  restriction  to  sub  cr-algebra  B" . 

Given  any  other  probabilistic  mapping  U  satisfying  U o S  =  U oT  we  must 
show  there  exist  a  unique  mapping  6  making  the  diagram 


(X,B) 


{Y,C) 

\e 


(X',B") 


commute.  Define 

e(x,C)  =  U(x,  C). 


From  this  it  follows  that 


(0o6id)  (x,C) 


fx>  dsid 

limitn^oc>{J2i=1  a<X9-i(of)} 
limitn — ^(*,  G)  cq)  } 

" - V - ' 

f  1  iff  9(x,C)  G  a\ 

I  0  otherwise 

9(x,  C) 

U(x,C) 
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This  proves  existence.  Uniqueness  employs  the  same  argument;  simply 
replace  6  with  another  mapping,  say  77,  making  the  diagram  commute  and 
one  concludes  that  77  =  U  =  9. 

4.  By  the  previous  two  results,  V  has  all  colimits. 

5.  The  object  1  is  a  seperator  in  V. 

Proof.  Suppose  the  two  parallel  arrows 

(X,B)  =^=*  (Y,  C) 

R 


are  distinct,  say 


T(x,  C)  =  a*  ^  6*  =  R(x,  C). 
Then  the  one  point  measure 


1 - ^  (X,B) 


seperates  the  pair  since  (T  o  SX)(C)  =  a*  while  (R  o  (5a,)(C)  =  6*. 

6.  The  category  V  is  concrete. 

Proof.  We  must  prove  that  there  exist  a  faithful  functor 
V  — >  Set. 

We  have  the  adjunction 

Pis 

Set  -  -  Vdet  Dis  H  U 

U 
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where  Dis  assigns  the  discrete  er-algebra  to  a  set  X  and  U  is  the  “for¬ 
getful”  functor  defined  on  the  objects  of  the  deterministic  subcategory 
'Pdet  by  U((X,C))  =  X  and  on  the  arrows  by  U(Sf)  =  f.  The  unit  of 
the  adjunction  is  the  identity  arrow  and  the  existence  and  uniqueness  re¬ 
quirements  given  an  arbitrary  (set)  function  /  :  X  — >  Y  follows  from  the 
commutative  diagram 


U((X,VX)) 
U(Sf) 


(X,VX) 
8f 


U((Y,B))  ( Y,B ) 

which  proves  that  the  identity  mapping  is  a  universal  arrow  from  the  object 
X  to  the  functor  U.  Thus  the  unit  of  the  adjunction,  rj  :  Id  — >  U  o  Dis  is 
defined  componentwise  by  rjx  =  idx ■  Consider  the  pair  of  adjunctions 


Dxs  i 

Set.  •»  -  Vdet —  *  V  i  o  Dis  H  U  o  V 

U  V 


Both  the  functors  V  and  U  are  faithful;  hence  so  is  their  composite. 


4  Probability  Measures  and  the  Part /Whole  Re¬ 
lation 

The  objects  we  need  to  consider  can  be  viewed  as  a  collection  of  point  features. 
Previously  we  have  considered  an  object  to  consist  of  /e-points  in  M3.  What  we 
really  require  is  that  the  object  be  considered  as  a  collection  of  N  parts,  with 
each  part  consisting  of  fc,  points  in  R3. 

Let  M  stand  for  model\  formally,  it  is  just  a  set  M  C  M3.  It  consists  of  a 
collection  of  parts 


m.j  c 


M 
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where  Lj  is  set  inclusion,  and  the  sets  m,  represent  points,  lines,  surfaces, 
etc.  Endow  M  with  the  cr-algebra  generated  by  {B  fl  M\B  €  £>k 3}  and  its’ 
collection  of  parts  {'i'nJ}^\1 ,  where  BR 3  is  the  Borel  cr-algebra  on  JR3.  Denote 
this  cr-algebra  by  Bm ■  Thus  (M,  Bm)  is  a  measurable  space  and  hence  an  object 
of  V. 

Because  the  parts  nij  are  measurable  sets  it  follows  that  the  sub  cr-algebra 


Bj={Bnmj\BeBM}  (49) 

is  a  tr-algebra  on  m.j,  and  so  each  {mj,Bj)  is  an  object  of  V,  and  each 
inclusion  map  is  a  measurable  map,  and  hence  a  deterministic  mapping  in  V 


uh 

(rrijjBj)  c - -  (. M,Bm )• 

If  P\  and  P2  are  probability  measures  on  these  parts 

(m  1,  Si) 


(m2,  B2) 


then  both  6tl  o  Pi  and  SL2  o  P2  are  probability  measures  on  M.  Furthermore 
any  convex 

combination  of  these  two  probability  measures  on  M  is  also  a 
probability  measure  on  M 


Oi(SLl  0  Pi)  +  02(Sl2  o  P2) 


(M,B) 


81 +  e 2  =  1,  Oi  >  0. 
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The  scalars  0j’s  are  the  “weights”  on  the  respective  parts.  Such  a  convex 
combination  of  probability  measures  on  the  “whole  object”  M  in  terms  of  its’ 
parts  ?n,;  is  very  appealing  for  the  recognition  problem.  The  individual  proba¬ 
bilities  on  the  parts  to*  can  be  calculated  as  in  the  applications  section,  say  by 
the  Monte  Carlo  method  as  mentioned,  and  then  the  probability  on  the  whole 
can  be  obtained  as 


Eti  (5,oPl) 


(M,B) 


Eili  0i  =  h 


9i  >  0. 


The  challenge  resides  is  in  choosing  the  weights  Qi  in  a  reasonable  fashion. 


5  Sheaf-Theoretic  Aspects 

A  convex  combination  of  probability  measures  P\  and  P2  as  in  the  preceding 
expression  does  not  take  into  account  the  fact  that  m\  and  m2  may  intersect.  To 
construct  amalgamations  which  take  into  account  intersections  it  is  necessary 
to  embed  V  into  a  slightly  larger  category  which  allows  scalar  endomappings 
defined  as 

(X,B)^(X,B) 

defined  by 

%  =  XxB  [0, 00) 

f  c  if  x  €  B 
:  ( x,B )  ^  < 

I  0  otherwise 

As  the  notation  suggest  these  scalar  mappings  are  deterministic,  induced  by 
the  identity  maps,  but  of  total  measure  c.  It  is  convenient  to  consider  probability 
measures  defined  on  the  coproducts 


1 


Pi 


1  +  ( m,i,Bi ) 
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where  P(  1)  =  0. 
Consider  the  diagram 


1  +  rrii 


1  +  M 

1  +  ?n2 

where  we  have  dropped  explicit  mention  of  the  cr-algebras  and  where  we  have 
redefined  i  to  account  for  the  new  element 
*  €  1  +  rrij, 

x  \/x  €  rrij 
*  x  =  * 

Each  measurable  mapping  l:]  has  a  retraction  Sj 

Lj 

1  +  ITlj  ■*  *  1  +  M  Sj  O  Lj  =  idi+m  . 


If  x  €  nij 
otherwise 

In  the  category  of  measurable  spaces,  let 

1  +  771 1 


1  +  M 


1  +  ?772 


which  is  measurable  and  defined  by 


sj(x)  = 
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be  the  pullback  of  i\  and  (-2-  Then  the  inclusion  maps  l  +  (miDm2)  >  1+my 
both  have  a  measurable  retraction  r:j  defined  by 


rj(x) 


x  If  x  £  mi  n  m2 
*  otherwise 


In  V ,  consider  the  diagram 


Suppose  Pi  and  P2  agree  on  m\  fl  m2, 


P\2  =  d'r-i  °  -Pi  =  $r2  O  P‘2 • 
Define  (the  nonprobabilistic  mapping) 


P  —  O  Pi  T  ^  P2  ^l2°P2°^2  O  P2 
Then  the  mapping 

P  =  6tPd(1+M)  o  P  :  1  ->  1  +  M. 

is  a  probabilistic  mapping  which  is  the  amalgamation  of  Pi  and  P2.  It  is  of 
measure  one  because  of  the  scaling;  it  satisfies  finite  additivity  because  each 
component  term  does. 

The  restriction  of  P  to  1  +  mi,  defined  as  P|i+mi  =  5Sl  o  5fd1+M'>  o  P,  gives 
P|l+mi  =  °Pl  T  ou2  0  P2  ^s1ot,2op2or2  °  P2 

^ - v - '  ' - v - - - ' 

—id  =0 

=  Pi 


60 


where  the  latter  two  terms  cancel  each  other  out.  This  can  be  verified  by 
looking  at  these  terms  on  the  generating  elements  of  the  cr-algebra  of  1  +  M  and 
the  subalgebras: 

Consider  the  event  B  n  mi,  where  B  €  Bm, 

(6SiOL2o  P2)(Bnm1)  =  J1+m26Sl0t,2(-,Br\m1)dP2 

=  h  ^kh2  (•,  B  n  m\)dP2  +  fminm2  Ssi ot2(-,  B  n  mi)dP2 

+  fmc  <*>Siol2(-,B  nm1)dP2 

=  o  +  p2(b  n  mi  n  m2)  +  o 

where,  as  usual,  for  any  measurable  Z  C  1+rni,  Jz  f(-)dPi  =  f1+mi  Xzf{-)dP\- 
Decomposing  the  integral  for  (SSloL2op2os2  0  B2)(B)  into  three  components 
shows  it  is  equal  to  P2(B  (Imi  fl  m2). 

This  same  procedure  works  for  any  finite  number  of  parts  {m,;}”=1  giving 
an  amalgamation  with  the  desired  restriction  properties.  The  more  familiar 
functorial  view  is  given  by  employing  the  composite  functor 

UV  :  V  ->  Set 

■■ob  (X,B)  VX 

:ar  (X,B)  ^  VX^VY 

where 

[D(T)\(P)  =  ToP 

Then  given  a  finite  cover  {mi}”=1  of  M,  and  a  matching  compatible  family  of 
elements  {Pi  €  (WT,)(mi)}”=1  there  exist  a  unique  amalgamation  P  €  UV(M) 
(constructed  as  above)  satisfying  the  restriction  property. 

6  Gluing  and  Restrictions  of  Probabilities. 

Suppose  we  are  working  in  a  topos  C  with  a  subobject  classifier  O.  Consider 
the  diagram 
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where  fi  and  /2  are  parts  of  X  (monies).  Taking  the  pullback  diagram  gives 


There  are  two  covariant  functors  of  interest:  The  functor 


g  :  C  Set 

A  P 

:0b  A  — >  {O'4  — ■>  [0, 1]  :  P  a  probability  measure} 

:ar  A  -4  B  i — ►  {O*4  -4  [0, 1]  :P  a  p.m.}  ^4  ({0s  -5-  [0, 1]  :P  a  p.m.} 

(50) 

where  O'  is  the  power  object  functor  (contravariant  horn  functor  hom(-,  O)  map¬ 
ping  the  objects  and  arrows  into  the  category  of  sets,  where  for  B  O,  with 
transpose  (adjunct)  1  0s,  0 f(P  o  t)  =  P  o  t  o  /. 

The  second  functor  of  interest  is  the  contravariant  functor 
T  ■.  C  Set 

:0b  A  i— >  {O'4  4[0,l]:Pa  probability  measure}  (51) 

:  P^4  i— >  o  im f 

where  im.  is  the  direct  image  functor.  (The  image  functor  goes  by  many  names. 
Another  common  symbol  is  3..)  Both  functors  T  and  Q  have  the  same  mapping 
on  objects.  Because  of  the  relationship  between  the  covariant  imaging  functor 
im.  and  the  contravariant  power  object  functor  O'  we  obtain  the  diagram 
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The  Beck-Chevalley  Theorem  says  that 

iV2  o  im,f1  =  imP2  o  fiPl 


(52) 


It  is  this  “commutativity  condition”  which  gives  rise  to  the  sheaf  theoretic 
aspects  which  we  now  describe. 

Suppose  Pi  and  P2  are  two  separate  probability  measures  on  A\  and  A 2 
respectively.  Hence  they  are  maps 


Pi  :  -  [0,1] 

P2  :  flA2  ->  [0,1] 


(53) 


(More  generally  Pi  and  P2  can  be  any  measures  on  any  boolean  subalgebra  of 
the  power  objects  O'41  and  flA2  respectively.)  If  these  two  measures  Pi  and  P2 
agree  on  Hi  fl  A2  then 

Pl  o  impi  =  P2  o  imP2 .  (54) 

Both  of  these  quantities  are  the  restriction  maps  -  the  restriction  of  P,  on  At  to 
A\  n  A2  for  i  =  1,2.  From  the  above  diagram  we  obtain  the  desired  amalgama¬ 
tion  of  Px  of  Pi  and  P2  as 


Px  =  Pi  O  fip  +  P2  o  nf2  -  P2  o  irnP2  O  nflC]f2 .  (55) 
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Because  /i  is  a  part  of  X,  the  restriction  of  Px  to  A\  is 


Px  °  =  P\  o  &1  o  tm/j  +  P2  o  o  im.f1  —P^o  imP2  o  iVin^2  o  irrif1 .  (56) 

Using  the  three  properties 
flfi  o  im.fi  = 

fi/in/2  =  qpi  o  ^/i  =  QV2  0  q/2  (57) 

Pi  o  imPl  =  P2  o  imP2  Matching  Condition 
along  with  the  Beck-Chevalley  Condition 

nh  o  imji  =  imP2  o  Qpl  (58) 


reduces  the  above  equation  to 

Px  o  im f2  =  Pi  +  Pi  o  imPl  o  UPl  —  Pi  o  imPl  o  UP1 
=  Pi 

Similarly  because  /2  is  a  part  of  X  it  follows  that 


(59) 


Px  °imf2  =  Pi¬ 


rn) 


which  proves  the  amalgamations  restrict  to  the  parts  of  which  it  is  composed. 
This  proof  extends  by  induction  to  an  arbitrary  covering  of  parts  {/*}"=» ji  The 
restriction  that  the  covers  be  parts  can  be  removed  by  using  the  image  of  each 
cover  fi.  Consequently  there  is  no  loss  in  generality  by  assuming  the  cover  to 
consist  of  monic  maps  fi . 

Conversely,  given  a  cover  {/*}”_!  and  a  measure  Px  on  X,  the  restriction  of 
Px  to  At  is  given  by 

Pi  =  Px.oiirifi  (61) 

which  is  a  measure  on  Ai  because  Px  is  a  measure.  When  the  maps  fi  are 
monic  then  irri  f.  acts  as  the  inclusion  map.  The  quantity  Pi  can  be  made  into 
a  probability  measure  by  the  appropriate  scaling. 
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6.1  Summary 

The  point  to  surface  matching  algorithm  is  a  practical  approach  to  solving  the 
ATR  problem  because  unlike  its’  predecessor,  using  point  to  point  matching 
of  “features” ,  it  circumvents  the  combinatorial  explosion  that  occurs  with  such 
methods.  The  representation  of  objects  by  CAD  models  is  seemingly  obvious, 
yet  the  representation  of  objects  by  a  collection  of  points  is  still  used  in  some 
quarters.  This  is  an  unnecessary  complication  and  makes  the  feasibility  of  such 
approaches  impractical  for  real-time  computations.  The  feasibility  of  the  al¬ 
gorithm  used  to  solve  the  point  to  surface  model  resides  in  the  fact  that  we 
can  “collapse”  the  objective  function  -  which  is  none  other  than  the  weighted 
sum  of  every  possibility  with  respect  to  the  discretization  -  by  exploiting  the 
bilineararity  of  the  inner  product.  Because  the  constraints  associated  with  this 
model  are  all  linear  and  decoupled  the  computations  of  the  optimization  algo¬ 
rithm  (search  direction  and  line  search)  can  be  carried  out  in  parallel  for  each 
of  the  unknown  parameters.  With  hindsight  this  approach  seems  elementary 
and  obvious.  Unlike  other  ad  hoc  approaches  (such  as  the  Iterated  Closest 
Point  Algorithm)  it  has  a  theoretical  basis  behind  it.  Indeed  our  derivation  of 
it  came  from  studying  the  deterministic  subcategory  of  the  category  of  proba¬ 
bilistic  mappings  which  provides  a  solid  foundation  for  the  approach  as  well  as 
any  future  studies/extensions  of  this  approach.  Using  the  category  of  proba¬ 
bilistic  mappings  a  sheaf-theoretic  view  of  the  problem  can  be  taken  but  is  not 
necessary  from  the  computational  point  of  view. 

The  computational  time  of  the  algorithm  depends  upon  the  number  of  image 
points  and  the  number  of  surfaces  modeling  the  object  (“the  resolution”  of  the 
CAD  model).  On  a  Mac  Quad  Pro  with  4  processors  the  computation  time 
for  our  standard  test  case  -  the  tank  with  206  surfaces  and  the  point  cloud 
consisting  of  300  image  points  -  is  about  2  minutes.  By  using  GPCA  this  can 
be  reduced  an  order  of  magnitude. 

The  results  of  the  output  of  the  algorithm  for  classification  purposes  are,  like 
any  other  algorithm,  heavily  dependent  upon  the  data  input  to  the  algorithm. 
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Namely  the  image  data  (point  cloud  of  data)  must  cover  a  sufficient  number 
of  surfaces  which  “distinquish”  the  true  object  from  any  decoys.  Given  data 
from  a  side  view  or  directly  overhead  is  an  extreme  case  in  which  it  is  very 
difficult  to  make  proper  classification  because  the  data  falls  on  surfaces  which 
all  lie  in  hyperplanes  which  are  similar,  i.e.,  have  similar  surface  normals).  On 
the  other  hand,  data  taken  from  an  aircraft  ladar  at  30  —  60  degrees  tends  to  be 
sufficient  provided  occlusion  does  not  block  the  view  of  many  surfaces  defining 
the  object.  This  algorithm  is  being  used  in  a  SBIR,  in  joint  effort  with  Etegent 
Technologies,  to  further  its  development  and  integrate  it  with  the  preprocessing 
of  the  raw  image  data  to  produce  an  overall  ATR  system. 

A  copy  of  the  Mathematica  code  is  included  with  this  report.  It  is  modu¬ 
lar  and  documented  sufficiently  to  be  able  to  follow  and  use  for  experimental 
purposes. 
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7  Financial  Report 


The  Table  below  shows  the  expenditures  on  a  running  monthly  basis.  The 
graph  following  it  shows  represents  the  block  funding  on  a  yearly  basis,  the 
green  graph  is  the  estimated  monthly  expenditures,  and  the  red  graph  is  the 
actual  monthly  expenditures.  The  estimated  expenditures  were  based  upon 
monthly  linear  spending  of  available  funding.  Note  we  did  not  start  charging 
until  March  2005  so  the  first  billing  is  in  the  6th  month  of  the  first  fiscal  year. 

There  were  no  travel  expenses  or  material  charges;  all  charges  were  labor 
expenses. 


Table  1.  Quarterly  Program-to-Date  Expenses 
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Expenditures  vs  Funding 


month 


Figure  30:  Funding  -  Expenditure  Profile 
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